Abstract - Visual tracking in the real world is challenging because of unavoidable background interference, target orientation variations and scale changes. Spatial information needs to be exploited to increase robustness; however, current methods such as the “Spatiogram” suffer from the large complexity of the spatial covariance calculation. Recently, a joint distribution representation has been used to estimate target orientation and scale, but this representation comes at the expense of losing position localization information. A new framework is proposed for target model representation by employing multiple kernel centers (MKC) within the kernel window. By employing MKC, spatial information is implicitly embedded. Steepest gradient ascent is used to track the target position, orientation and scale simultaneously. Using an adaptive step size in the gradient ascent iteration, the proposed method inherits the desirable properties of the mean shift approach and shows a fast convergence rate. Experimental results in several challenging scenarios demonstrate its robustness and its superiority over a previous technique.

1 Introduction

Object tracking based on visual features such as color and texture has great flexibility for tracking rigid and non-rigid objects. Extensive work has been done in this area [1, 6, 5, 9], but tracking is still challenging in the presence of background interference and orientation and scale changes, which usually lead to losing the target. For a recent survey of object tracking methods, see [12].

The background-weighted histogram is employed to select the salient parts in the target representation [5]. This method requires precalculating the background feature representation around a region that is usually much larger than the target area. Higher-order moments in the target representation are used to increase the robustness of tracking [3]. Each bin in the feature space is spatially weighted by the mean and covariance of the locations of the pixels that contribute to that bin; however, computing these means and covariances adds considerably to the complexity.

Multiple kernels are used by introducing the roof kernel [7, 8] based on the SSD (sum of squared differences) measure. The drawback of this representation is that it tends to bring extra noise along the “roof” direction. Also, this approach is not as efficient as the mean shift method due to the complexity of the Newton-style iterations it requires.

Recently, the joint distribution representation has been used with the mean shift procedure to estimate target position, orientation and scale simultaneously [11]. One drawback of this approach is that the capability of estimating target orientation comes at the expense of losing localization information in the target representation. When the joint distribution is adopted, the kernel function assigns smaller weights to the pixels farther from the orientation direction, even though these pixels are valuable for target representation. Another problem is treating scale as a variable. The normalization factor in the target model is independent of the kernel center and orientation, so the mean shift method can be carried out with respect to these variables; however, the normalization factor depends on the target scale and is no longer a constant when scale is treated as a variable, so employing the mean shift method no longer guarantees convergence.

In this paper we propose a new framework for target model representation. Multiple kernel centers (MKC) are employed inside the kernel window to form an augmented target model. The resulting MKC model contains both orientation and scale information, which the single kernel center (SKC) model lacks. Also, spatial constraints are implicitly embedded. The orientation and scale estimates are obtained using steepest gradient ascent. By employing an adaptive stepsize, the proposed method inherits the desirable property of the mean shift algorithm and shows a fast convergence rate. The main contribution of the paper is a new approach for building the target appearance model that provides target location, orientation and scale estimates simultaneously through steepest gradient ascent with an adaptive stepsize. Comparisons with [11], which is the most recent algorithm in the literature that can handle target location, orientation and scale, show the superiority of our new approach.

The paper is organized as follows. Section 2 presents the MKC model. Section 3 presents the MKC algorithm with location estimation only. Section 4 describes the MKC algorithm incorporating orientation and scale estimation. Section 5 shows the experimental results. Conclusions are given in Section 6.

2 Target model

We shall introduce the MKC model and describe the normalization issues that are important in the MKC scenario.

2.1 MKC model

Given a kernel described by a convex and monotonically decreasing kernel profile $k(x)$, the traditional target model $q_u$ is given by

$$q_u = C\sum_{i=1}^{n} k\left(\|x_i^*\|^2\right)\delta_{\chi(x_i^*),u} \tag{1}$$

where the summation is over the $n$ pixels in the target region (assumed to have been segmented by an operator in the initial frame), $\delta$ is the Kronecker delta function, $\chi:\mathbb{R}^2\to\{1,\dots,m\}$ maps the pixel at location $x_i^*$ to the quantized feature, $u$ is an element of the finite set of features $\{1,\dots,m\}$, and $C$ is the normalization constant satisfying the condition

$$\sum_{u=1}^{m} q_u = 1 \tag{2}$$

and is given by

$$C = \frac{1}{\sum_{i=1}^{n} k\left(\|x_i^*\|^2\right)} \tag{3}$$

The candidate model with bandwidth $h$ is given by

$$p_u(y) = C_h\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)\delta_{\chi(x_i),u} \tag{4}$$

where $y$ is both the centroid of the target region and the kernel center. The normalization constant is

$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \tag{5}$$

This model, computed with a single kernel center (SKC) at the centroid, has limited ability to delineate targets, and the resulting mean shift procedure using this SKC model can lead to localization ambiguity [8]. As shown in Fig. 1, the two different targets cannot be discriminated with the SKC target model. Note that the concepts of region centroid and kernel center are different. The region centroid $y$ represents the location of the target, and the kernel center indicates where we want to assign large weights to form a target model. For example, we may want to put the kernel center on a salient part within the target region to discriminate it from the background features, but the salient part is not necessarily at the region centroid.

Figure 1: SKC target model.

The idea of MKC is the following: the locations of the region centroid and the kernel center can be different, and one can have a number of kernel centers to impose spatial constraints as long as the target model is normalized. We represent the $l$th kernel center $r_l$ as a function of the region centroid $y$, the rotation angle $\phi$ (counterclockwise) and the bandwidth $h$ by

$$r_l(z) = y + h\,\Delta r_l(\phi) \tag{6}$$

where $z = [y^T\ \phi\ h]^T$ and

$$\Delta r_l(\phi) = d_l\begin{bmatrix}\cos(\phi + \psi_l)\\ \sin(\phi + \psi_l)\end{bmatrix} \tag{7}$$

Here $l$ indexes the $l$th kernel center, and the constants $d_l$, $\psi_l$ are its initial distance and angle in polar coordinates with respect to the centroid $y$. In view of this, the MKC model can be expressed as

$$q_u = C\sum_{i=1}^{N}\sum_{l=1}^{L} k\left(\|r_l^* - x_i^*\|^2\right)\delta_{\chi_l(x_i^*),u} \tag{8}$$

$$p_u(z) = C(h)\sum_{i=1}^{N}\sum_{l=1}^{L} k\left(\left\|\frac{r_l(z) - x_i}{h}\right\|^2\right)\delta_{\chi_l(x_i),u} \tag{9}$$

where $L$ is the number of kernel centers used, $\chi_l:\mathbb{R}^2\to\{(l-1)m+1,\dots,lm\}$ maps to the quantized feature calculated from the $l$th kernel center, and $u$ is an element of the finite set $\{1,\dots,Lm\}$. For the convenience of later derivations, we substitute $N$ for $n_h$, where $N$ represents the number of all the pixels in a given frame. This is equivalent to the original form because the pixels outside the kernel window do not contribute to the model, and in this notation $N$ is independent of $h$. Since the bandwidth $h$ is treated as a variable, the normalization factor is a function of $h$, denoted as $C(h)$. Note that $C(h)$ is independent of $y$ and $\phi$ given the kernel centers. Imposed by the conditions $\sum_{u=1}^{Lm} q_u = 1$ and $\sum_{u=1}^{Lm} p_u(z) = 1$, $C$ and $C(h)$ are given by

$$C = \frac{1}{\sum_{i=1}^{N}\sum_{l=1}^{L} k\left(\|r_l^* - x_i^*\|^2\right)} \tag{10}$$

$$C(h) = \frac{1}{\sum_{i=1}^{N}\sum_{l=1}^{L} k\left(\left\|\frac{r_l(z) - x_i}{h}\right\|^2\right)} \tag{11}$$

Note that by employing MKC, the number of quantized features $m$ remains the same, but the finite set used to delineate the target model is augmented from $\{1,\dots,m\}$ to $\{1,\dots,Lm\}$, with the spatial information now embedded via the MKC. We use the same example as in Fig. 1 but add another kernel center, as shown in Fig. 2. The two targets are discriminated by the MKC model, which embodies spatial constraints.

Figure 2: MKC model with two kernel centers.
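
The effect illustrated by Figs. 1 and 2 can be reproduced numerically. The following is a minimal Python sketch, assuming a circular kernel window of radius $h$ (the paper uses an elliptical window, Section 2.2), the Epanechnikov profile with its constant factor dropped, and a hypothetical two-tone disk target together with its mirror image. The helper names (`kernel_centers`, `mkc_histogram`, `two_tone_disk`) are illustrative only and do not come from the paper.

```python
import math

def profile(x):
    """Epanechnikov profile with its constant factor dropped (it cancels after normalization)."""
    return max(0.0, 1.0 - x)

def kernel_centers(y, phi, h, polar_offsets):
    """r_l = y + h * d_l * [cos(phi + psi_l), sin(phi + psi_l)], following (6)-(7)."""
    return [(y[0] + h * d * math.cos(phi + psi), y[1] + h * d * math.sin(phi + psi))
            for d, psi in polar_offsets]

def mkc_histogram(pixels, centers, h, m):
    """Normalized MKC model with L*m bins (cf. (8)-(10)); bin l*m + b collects the kernel-l
    weights of pixels whose feature is b. With a single center this is the SKC model (1)."""
    contributions = [[] for _ in range(len(centers) * m)]
    for l, (cx, cy) in enumerate(centers):
        for (px, py), b in pixels.items():
            w = profile(((px - cx) ** 2 + (py - cy) ** 2) / h ** 2)
            if w > 0.0:
                contributions[l * m + b].append(w)
    sums = [math.fsum(c) for c in contributions]  # fsum: exactly rounded, order-independent
    total = math.fsum(sums)
    return [s / total for s in sums]

def bhattacharyya(p, q):
    return math.fsum(math.sqrt(pu * qu) for pu, qu in zip(p, q))

def two_tone_disk(left_bin, right_bin, radius=5, center_bin=2):
    """Hypothetical target: a disk whose left and right halves carry different feature bins."""
    pixels = {}
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            if x * x + y * y <= radius * radius:
                pixels[(x, y)] = center_bin if x == 0 else (left_bin if x < 0 else right_bin)
    return pixels

target_a, target_b = two_tone_disk(0, 1), two_tone_disk(1, 0)  # mirror images of each other
h, m = 5.0, 3
skc = kernel_centers((0.0, 0.0), 0.0, h, [(0.0, 0.0)])
mkc = kernel_centers((0.0, 0.0), 0.0, h, [(0.0, 0.0), (0.4, 0.0)])  # second center at (2, 0)
print(bhattacharyya(mkc_histogram(target_a, skc, h, m), mkc_histogram(target_b, skc, h, m)))
print(bhattacharyya(mkc_histogram(target_a, mkc, h, m), mkc_histogram(target_b, mkc, h, m)))
```

With the single center on the centroid, the radially symmetric kernel gives the two mirror-image targets identical histograms, so the first coefficient is 1 up to rounding. Adding the offset kernel center separates them, and the second coefficient drops noticeably below 1.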


2.2 Scaled radius

An ellipse is employed here to represent the target region. To accommodate the kernel profile representation, the ellipse region should be normalized to a unit circle [5]. The normalized distance from the pixel $x_i = [x_i\ y_i]^T$ to the $l$th kernel center $r_l = [r_x\ r_y]^T$ can be represented as

$$\frac{\|x_i - r_l\|}{R(\theta, h)} \tag{12}$$

where $\theta = \arctan\frac{y_i - r_y}{x_i - r_x}$ and $R(\theta, h)$ is the scaled radius from the kernel center to the point on the ellipse contour that lies in the direction $\theta$ from the horizontal. By geometric similarity, $R(\theta, h)$ is proportional to $h$ for a given $\theta$. Therefore, $R(\theta, h)$ can be rewritten as

$$R(\theta, h) = R_0(\theta)\,h \tag{13}$$

where $R_0(\theta)$ is the scaled radius at $h = 1$. The goal is to find $R_0(\theta)$ given $x_i$ and $r_l$.

To calculate $R_0(\theta)$ of a current ellipse centered at $y = [o_x\ o_y]^T$, we rotate the ellipse back to its initial position (Fig. 3). Without loss of generality, we assume the initial ellipse ($h = 1$) is given by

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{14}$$

where $a$, $b$ are the semi-axes along the x-axis and y-axis, respectively. The relative position $\Delta x' = [\Delta x'_i\ \Delta y'_i]^T$ of the pixel $x_i$ with respect to $y$ after rotation is given by

$$\begin{bmatrix}\Delta x'_i\\ \Delta y'_i\end{bmatrix} = \begin{bmatrix}\cos\phi' & -\sin\phi'\\ \sin\phi' & \cos\phi'\end{bmatrix}\begin{bmatrix}x_i - o_x\\ y_i - o_y\end{bmatrix} \tag{15}$$

where $\phi' = -\phi$ is the rotation angle (counterclockwise). Since rotation leaves the distance between any two points unchanged, we have

$$R_0(\theta) = R_0(\theta') \tag{16}$$

where $\theta' = \arctan\frac{\Delta y'_i - h\Delta r'_y}{\Delta x'_i - h\Delta r'_x}$ and $\Delta r'_l = [\Delta r'_x\ \Delta r'_y]^T$ is the relative position of the kernel center with respect to $y$ in the initial ellipse. Rewriting the initial ellipse in polar coordinates about the kernel center gives

$$\begin{bmatrix}x\\ y\end{bmatrix} = R_0(\theta')\begin{bmatrix}\cos\theta'\\ \sin\theta'\end{bmatrix} + \begin{bmatrix}\Delta r'_x\\ \Delta r'_y\end{bmatrix} \tag{17}$$

From (14) and (17), we obtain

$$R_0(\theta') = \frac{-b^2\Delta r'_x\cos\theta' - a^2\Delta r'_y\sin\theta' + \left(2a^2b^2\Delta r'_x\Delta r'_y\sin\theta'\cos\theta' + a^4b^2\sin^2\theta' + a^2b^4\cos^2\theta' - a^2b^2\cos^2\theta'\,(\Delta r'_y)^2 - a^2b^2\sin^2\theta'\,(\Delta r'_x)^2\right)^{1/2}}{b^2\cos^2\theta' + a^2\sin^2\theta'} \tag{18}$$

Therefore, the normalized distance can be given in an equivalent form by

$$\frac{\|\Delta x' - h\,\Delta r'_l\|}{R_0(\theta')\,h} \tag{19}$$

Note that, given the kernel centers in the initial ellipse, $\Delta r'_l$ is always fixed. In particular, for $\Delta r'_l = [0\ 0]^T$, (19) reduces to

$$\frac{1}{h}\sqrt{\frac{(\Delta x'_i)^2}{a^2} + \frac{(\Delta y'_i)^2}{b^2}} \tag{20}$$

Figure 3: Ellipse Rotation

3 MKC algorithm with location estimation only

We employ the MKC model to form the similarity function defined by the Bhattacharyya coefficient [5] as

$$\rho(z) = \sum_{u=1}^{Lm}\sqrt{p_u(z)\,q_u} \tag{21}$$
To illustrate the relationship between the SKC model and the MKC model, we first treat $\phi$ and $h$ as constants. Several optimization techniques can be used to find the mode of the similarity function; however, the major concerns are complexity and convergence rate. Since evaluating the Hessian matrix is computationally expensive, we employ steepest gradient ascent to construct the algorithm. The crucial issue here is how to find a suitable stepsize: a stepsize that is too large leads to divergence, and one that is too small results in slow convergence. The mean shift procedure is actually a gradient ascent method with an adaptive stepsize [4]. Therefore, we shall investigate the mean shift stepsize first.
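
The statement that mean shift is gradient ascent with an adaptive stepsize can be checked on a 1-D kernel density estimate. The sketch below assumes a Gaussian profile (chosen only for this illustration; the experiments in Section 5 use the Epanechnikov profile), drops the kernel normalization constants, and uses hypothetical helper names. It compares the mean shift step with the KDE gradient multiplied by the stepsize $h^2/(2\hat f_{h,G}(x))$, where the factor 2 is what remains of the constant $c$ once the kernel constants are dropped.

```python
import math

def k(x):
    """Gaussian profile (assumed here for illustration)."""
    return math.exp(-0.5 * x)

def g(x):
    """g(x) = -k'(x)."""
    return 0.5 * math.exp(-0.5 * x)

def kde(x, data, h, profile):
    """1-D kernel density estimate with the normalization constant dropped."""
    return sum(profile(((x - xi) / h) ** 2) for xi in data) / (len(data) * h)

def kde_gradient(x, data, h):
    """d/dx of kde(x, data, h, k), using d/dx k(((x - xi)/h)^2) = 2 (xi - x) g(.) / h^2."""
    return sum(2.0 * (xi - x) * g(((x - xi) / h) ** 2) for xi in data) / (len(data) * h ** 3)

def mean_shift_step(x, data, h):
    """g-weighted mean of the samples minus x."""
    w = [g(((x - xi) / h) ** 2) for xi in data]
    return sum(wi * xi for wi, xi in zip(w, data)) / sum(w) - x

def adaptive_stepsize(x, data, h):
    """h^2 / (c * f_G(x)) with c = 2 once the kernel constants are dropped."""
    return h ** 2 / (2.0 * kde(x, data, h, g))

data, h = [0.0, 0.3, 1.0, 1.2], 0.5
for x in (-2.0, 0.6, 3.0):
    # The mean shift step equals a gradient-ascent step taken with the adaptive stepsize.
    print(x, mean_shift_step(x, data, h), adaptive_stepsize(x, data, h) * kde_gradient(x, data, h))
```

Far from the samples (x = 3.0) the density is low and the stepsize is several orders of magnitude larger than near the mode (x = 0.6), which is the behavior that a fixed stepsize cannot provide.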


3.1 Mean shift stepsize

Using notation similar to that in [4], the kernel density estimate (KDE) is given by

$$\hat f_{h,K}(x) = \frac{c_{k,d}}{nh^d}\sum_{i=1}^{n} k\left(\left\|\frac{x - x_i}{h}\right\|^2\right) \tag{22}$$

where $c_{k,d}$ is the normalization constant, $d$ is the dimension of $x$, and $k(x)$ is the profile of the kernel $K(x)$ with the relationship

$$K(x) = c_{k,d}\,k\left(\|x\|^2\right) \tag{23}$$

Define the derivative

$$g(x) = -k'(x) \tag{24}$$

Then the mean shift vector is given by

$$m_{h,G}(x) = \frac{h^2}{c\,\hat f_{h,G}(x)}\,\hat\nabla f_{h,K}(x) \tag{25}$$

where $c$ is a constant. The function $\hat f_{h,G}(x)$ is the KDE computed with the kernel $G$ by

$$\hat f_{h,G}(x) = \frac{c_{g,d}}{nh^d}\sum_{i=1}^{n} g\left(\left\|\frac{x - x_i}{h}\right\|^2\right) \tag{26}$$

where $G(x)$ is defined as

$$G(x) = c_{g,d}\,g\left(\|x\|^2\right) \tag{27}$$

From (25) we can see that the mean shift stepsize $a_m$ is given by

$$a_m = \frac{h^2}{c\,\hat f_{h,G}(x)} \tag{28}$$

Therefore, in regions of low density, $a_m$ is large, while in regions near the local maxima, $a_m$ is small and the search is more refined.

3.2 MKC stepsize

Now consider our problem based on the MKC model. Since $\phi$ and $h$ are treated as constants, (9) reduces to

$$p_u(y) = C(h_0)\sum_{i=1}^{N}\sum_{l=1}^{L} k\left(\left\|\frac{r_l(y) - x_i}{h_0}\right\|^2\right)\delta_{\chi_l(x_i),u} \tag{29}$$

where

$$r_l(y) = y + h_0\,\Delta r_l(\phi_0) \tag{30}$$

and

$$C(h_0) = \frac{1}{\sum_{i=1}^{N}\sum_{l=1}^{L} k\left(\left\|\frac{r_l(y) - x_i}{h_0}\right\|^2\right)} \tag{31}$$

The linear approximation of $\rho(y)$ defined in (21) is given by [5]

$$\rho(y) \approx \frac{1}{2}\sum_{u=1}^{Lm}\sqrt{p_u(y_0)\,q_u} + \frac{1}{2}\sum_{u=1}^{Lm}p_u(y)\sqrt{\frac{q_u}{p_u(y_0)}} \tag{32}$$

where $y_0$ is the initial centroid in the current frame. Taking the gradient of (32) with respect to $y$ and using (29), we obtain

$$\nabla\rho(y) = \frac{C(h_0)}{h_0^2}\sum_{i=1}^{N}\sum_{l=1}^{L} w_{i,l}\left(x_i - r_l(y)\right)g\left(\left\|\frac{r_l(y) - x_i}{h_0}\right\|^2\right) \tag{33}$$

where

$$w_{i,l} = \sum_{u=1}^{Lm}\sqrt{\frac{q_u}{p_u(y_0)}}\,\delta_{\chi_l(x_i),u} \tag{34}$$

The constraint is $p_u(y_0) > 0$, and the color features should be selected so as to satisfy this constraint. Similarly to (28), the MKC stepsize is given by

$$a_1 = \frac{h_0^2}{C(h_0)\sum_{l=1}^{L} f_l(y)} \tag{35}$$

where

$$f_l(y) = \sum_{i=1}^{N} w_{i,l}\,g\left(\left\|\frac{r_l(y) - x_i}{h_0}\right\|^2\right) \tag{36}$$

Since $f_l(y)$ is the weighted KDE calculated from the $l$th kernel center, $\sum_{l=1}^{L} f_l(y)$ can be interpreted as a mixture of the estimated probability densities characterized by multiple kernel centers. Mixture probability density functions (pdfs), such as Gaussian mixtures, are widely used in parametric estimation; the sum $\sum_{l=1}^{L} f_l(y)$ can be viewed as the counterpart of the mixture pdf in the nonparametric case (as in the KDE). Note that we omit the normalization constants in $f_l(y)$ and $\sum_{l=1}^{L} f_l(y)$, which are independent of $y$ given the kernel type. Therefore, the MKC stepsize defined by (35) is adaptive, which gives it the same desirable property as the mean shift stepsize.

Employing steepest gradient ascent by

$$y^{j+1} = y^j + a_1\nabla\rho(y^j) \tag{37}$$

and substituting (35) for $a_1$, the MKC algorithm¹ is given (after some algebraic manipulations) by

$$y^{j+1} = \frac{\sum_{i=1}^{N}\sum_{l=1}^{L}\left(x_i - h_0\,\Delta r_l(\phi_0)\right)w_{i,l}\,g_{i,l}}{\sum_{i=1}^{N}\sum_{l=1}^{L} w_{i,l}\,g_{i,l}} \tag{38}$$

where $g_{i,l}$ is short for $g\left(\left\|\frac{r_l(y^j) - x_i}{h_0}\right\|^2\right)$. Note that $y^j$ cancels out when this adaptive stepsize is used.

¹At this stage there is no orientation or scale estimation; it will be added in Section 4.
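
To make (34) and (38) concrete, here is a minimal Python sketch of the fixed-orientation, fixed-scale MKC location iteration. Several assumptions are made that do not come from the paper: the frame is a dictionary mapping integer pixel coordinates to quantized feature bins, the kernel window is circular with radius $h_0$ rather than elliptical, the Epanechnikov profile is used with its constant dropped (so $g$ is 1 inside the window and 0 outside), and bins with $p_u(y_0) = 0$ are simply skipped. The function names and the synthetic target are hypothetical.

```python
import math

def k(x):
    """Epanechnikov profile with the constant factor dropped (it cancels in (38))."""
    return max(0.0, 1.0 - x)

def g(x):
    """g(x) = -k'(x) for the profile above: 1 inside the unit window, 0 outside."""
    return 1.0 if x < 1.0 else 0.0

def centers(y, h, offsets):
    """r_l(y) = y + h0 * Delta r_l(phi_0) as in (30); offsets holds Delta r_l(phi_0)."""
    return [(y[0] + h * dx, y[1] + h * dy) for dx, dy in offsets]

def histogram(image, y, h, offsets, m):
    """MKC model (29): bin l*m + b collects kernel-l weights of pixels with feature b."""
    hist = [0.0] * (len(offsets) * m)
    for l, (cx, cy) in enumerate(centers(y, h, offsets)):
        for (px, py), b in image.items():
            hist[l * m + b] += k(((px - cx) ** 2 + (py - cy) ** 2) / h ** 2)
    total = sum(hist)
    return [v / total for v in hist]

def mkc_locate(image, q, y0, h, offsets, m, eps=0.5, max_iter=20):
    """Iterate (38) with the weights (34) held fixed at the frame's initial centroid y0."""
    p0 = histogram(image, y0, h, offsets, m)
    y = y0
    for _ in range(max_iter):
        num_x = num_y = den = 0.0
        for l, (cx, cy) in enumerate(centers(y, h, offsets)):
            dx, dy = offsets[l]
            for (px, py), b in image.items():
                u = l * m + b
                if p0[u] <= 0.0:  # (34) needs p_u(y0) > 0; such bins are skipped here
                    continue
                wg = math.sqrt(q[u] / p0[u]) * g(((cx - px) ** 2 + (cy - py) ** 2) / h ** 2)
                num_x += (px - h * dx) * wg  # numerator of (38): (x_i - h0 Delta r_l) w g
                num_y += (py - h * dy) * wg
                den += wg
        y_new = (num_x / den, num_y / den)
        step = math.hypot(y_new[0] - y[0], y_new[1] - y[1])
        y = y_new
        if step < eps:
            break
    return y

# Hypothetical test frame: background bin 0 and a disk of radius 6 at (20, 20)
# whose left half is bin 1 and right half is bin 2.
image = {}
for px in range(40):
    for py in range(40):
        inside = (px - 20) ** 2 + (py - 20) ** 2 <= 36
        image[(px, py)] = (1 if px < 20 else 2) if inside else 0

h, m = 6.0, 3
offsets = [(0.0, 0.0), (0.3, 0.0)]  # one center on the centroid, one shifted along x
q = histogram(image, (20.0, 20.0), h, offsets, m)
print(mkc_locate(image, q, (20.0, 20.0), h, offsets, m))  # approx. (19.97, 20.0)
```

Started at the true location, the estimate barely moves; the small offset in x comes from sampling the shifted kernel window on the integer grid. When all kernel centers are placed on the centroid, the same routine produces the single-center mean shift update discussed in Section 3.3.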

For the convergence analysis, we give the proposition below. The assumptions are that the linear approximation given by (32) is satisfactory and that there is at least one nonzero $w_{i,l}$ in each iteration; both assumptions are usually valid between consecutive frames.

Proposition. If the kernel $K$ has a convex and monotonically decreasing profile, the sequence $\{y^j\}$ given by (38) converges to $y^*$, where $\rho(y^*)$ is a local maximum of the similarity function defined by the Bhattacharyya coefficient.

The proof is given in the Appendix. Therefore, convergence of the MKC algorithm to $y^*$ is guaranteed by using the MKC stepsize given in (35) for fixed orientation and scale.


3.3 Relationship to mean shift procedure

Consider the special case in which all the kernel centers coincide with the centroid, which means $\Delta r_l(\phi_0) = 0$. Then (38) reduces to

$$y^{j+1} = \frac{\sum_{i=1}^{N} x_i\,w_i\,g\left(\left\|\frac{y^j - x_i}{h_0}\right\|^2\right)}{\sum_{i=1}^{N} w_i\,g\left(\left\|\frac{y^j - x_i}{h_0}\right\|^2\right)} \tag{39}$$

where $w_i = L\,w_{i,l}$, since $w_{i,l}$ and $g_{i,l}$ are independent of $l$ in this case because all the kernel centers are the same. We can see that (39) is exactly the mean shift procedure using the SKC model described in [5]. In this case, the MKC model contains the same information as the SKC model, so the two algorithms give the same result.

Since the MKC algorithm given by (38) possesses all the properties of the mean shift procedure, such as an adaptive stepsize and guaranteed convergence, we can draw the following conclusion: the mean shift procedure is a special case of the MKC algorithm with a single kernel center at the centroid.


4 Incorporation of orientation and scale estimation into the MKC algorithm

For robust tracking, the target region used to delineate the target should be as precise as possible in order to reject non-object regions. Therefore, orientation and scale are important parameters to be estimated. Most existing approaches restrict themselves to the mean shift framework and suffer from either heuristics or large complexity. Since the MKC model contains both orientation and scale information, we use it to estimate the orientation and scale.

4.1 Orientation and scale estimation

We employ steepest gradient ascent to optimize target location, orientation and scale simultaneously. Following the procedure discussed in Section 3.2 but without using the linear approximation, the gradient of $\rho(z)$ defined in (21) is given by

$$\nabla\rho(z) = \sum_{u=1}^{Lm}\frac{\sqrt{q_u}}{2\sqrt{p_u(z)}}\,\nabla p_u(z) \tag{40}$$

Since the orientation and scale are no longer constants, stepsize selection is necessary in this case. We employ the Armijo rule [2] because of its efficiency and simplicity. The initial stepsize $a^0$ is a critical parameter for the convergence rate. In view of this, it is natural to use the adaptive stepsize given by (35) as the initial value $a^0$, which is given by

$$a^0 = \frac{h^2}{C(h)\sum_{l=1}^{L} f_l(z)} \tag{41}$$

where

$$f_l(z) = \sum_{i=1}^{N} w_{i,l}\,g\left(\left\|\frac{r_l(z) - x_i}{h}\right\|^2\right) \tag{42}$$

and

$$w_{i,l} = \sum_{u=1}^{Lm}\sqrt{\frac{q_u}{p_u(z)}}\,\delta_{\chi_l(x_i),u} \tag{43}$$

The Armijo rule stepsize is given by $a^j = \beta^{n'}a^0$, where $n'$ is the first nonnegative integer $n$ that satisfies

$$\rho(z^{j+1}) - \rho(z^j) \ge \lambda\,a^j\,\|\nabla\rho(z^j)\|^2/\gamma^2 \tag{44}$$

where $\lambda$, $\beta$ are fixed scalars satisfying $0 < \lambda < 1$, $0 < \beta < 1$. The choice of $\beta$ is usually from 0.1 to 0.5 [2]. A compensation factor $\gamma$ is needed here since the distance $\left\|\frac{r_l(z) - x_i}{h}\right\|$ has been normalized by the scaled radius discussed in Section 2.2. An approximate value is given by $\gamma = \min(a, b)$. Therefore, the increment $a^j\nabla\rho(z^j)$ of the MKC algorithm can be obtained (after some algebraic manipulations) as

$$\Delta y^j = \beta^{n'}\frac{\sum_{i=1}^{N}\sum_{l=1}^{L}\left(x_i - r_l(z^j)\right)w^j_{i,l}\,g^j_{i,l}}{\sum_{i=1}^{N}\sum_{l=1}^{L} w^j_{i,l}\,g^j_{i,l}} \tag{45}$$

$$\Delta\phi^j = \frac{\beta^{n'}h^j}{\gamma^2}\,\frac{\sum_{i=1}^{N}\sum_{l=1}^{L} v_{i,l}\,w^j_{i,l}\,g^j_{i,l}}{\sum_{i=1}^{N}\sum_{l=1}^{L} w^j_{i,l}\,g^j_{i,l}} \tag{46}$$

$$\Delta h^j = \frac{\beta^{n'}}{\gamma^2 h^j}\,\frac{\sum_{i=1}^{N}\sum_{l=1}^{L}\left[w^j_{i,l} - \rho(z^j)\right]s_{i,l}\,g^j_{i,l}}{\sum_{i=1}^{N}\sum_{l=1}^{L} w^j_{i,l}\,g^j_{i,l}} \tag{47}$$

where

$$v_{i,l} = (x_i - y^j)^T\,\frac{\partial\Delta r_l(\phi^j)}{\partial\phi} \tag{48}$$

$$s_{i,l} = (x_i - y^j)^T\left(x_i - r_l(z^j)\right) \tag{49}$$

and $g^j_{i,l}$ is short for $g\left(\left\|\frac{r_l(z^j) - x_i}{h^j}\right\|^2\right)$. Note that, unlike (34), $w^j_{i,l}$ should be updated as in (43) for each iteration, and $p_u(z^j)$ cannot be 0 in this case since it is calculated from the updated target region in each iteration. Equations (45)-(47) are carried out iteratively until $\|y^{j+1} - y^j\| < \epsilon_y$, $|\phi^{j+1} - \phi^j| < \epsilon_\phi$ and $|h^{j+1} - h^j| < \epsilon_h$ are satisfied, where $\epsilon_y$ is chosen such that $y^{j+1}$ and $y^j$ are within the same pixel.
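
The backtracking search in (44) can be sketched as below. This is a generic Armijo rule applied to a callable similarity function `rho` with gradient `grad`, both hypothetical placeholders; the MKC-specific increments (45)-(47) are not reproduced. The default `lam` and `beta` are illustrative values within the ranges given above, and `gamma` plays the role of the compensation factor $\gamma$.

```python
def armijo_step(rho, grad, z, a0, beta=0.5, lam=0.1, gamma=1.0, max_n=50):
    """Backtracking line search following (44): try a = beta**n * a0 for n = 0, 1, ...
    and accept the first step with rho(z + a*grad) - rho(z) >= lam * a * ||grad||^2 / gamma^2."""
    d = grad(z)
    sq_norm = sum(v * v for v in d)
    f0 = rho(z)
    for n in range(max_n):
        a = beta ** n * a0
        z_next = [zi + a * di for zi, di in zip(z, d)]
        if rho(z_next) - f0 >= lam * a * sq_norm / gamma ** 2:
            return z_next, n
    return list(z), max_n  # no acceptable step found: keep the current iterate

def rho(z):  # hypothetical concave stand-in for the similarity function
    return -(z[0] - 1.0) ** 2 - 2.0 * (z[1] + 0.5) ** 2

def grad(z):
    return [-2.0 * (z[0] - 1.0), -4.0 * (z[1] + 0.5)]

print(armijo_step(rho, grad, [0.0, 0.0], a0=10.0))  # ([0.625, -0.625], 5)
```

Starting from a deliberately large initial stepsize, the rule backs off five times before the sufficient-increase condition holds. This is why a good initial value such as (41) matters for the convergence rate: the closer $a^0$ is to an acceptable step, the fewer evaluations of $\rho$ are needed.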

Some insight can be obtained from the iterations given above. For $n' = 0$, it can be shown that the centroid iteration given by (45) is the same as (38), except that $w^j_{i,l}$ is updated in each iteration. If all the kernel centers are at the region centroid, which implies $\Delta r_l(\phi) = 0$, (46) yields $\phi^{j+1} = \phi^j$; therefore, the orientation information is not available in this case. Also, for $\Delta r_l(\phi) = 0$, (49) reduces to $s_{i,l} = \|x_i - y^j\|^2$. The norm $\|x_i - y^j\|$ is the distance between the pixel and the centroid, and the average distance of all the pixels within the target region represents the target scale. Since $\rho(z^j)$ is independent of $i$ and $l$, (47) actually computes the difference between an unweighted scale and a scale weighted by $w^j_{i,l}$.

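Given two consecutive frames, the variations of $\phi$ and $h$ are always limited, and this can be utilized to improve tracking performance in highly cluttered environments. For some thresholds $\Delta h_{\max}$ and $\Delta\phi_{\max}$, a feasible solution is given by

$$\phi^{j+1} = \begin{cases}\phi^{j+1} & \text{if } |\phi^{j+1} - \phi_p| \le \Delta\phi_{\max}\\ \phi^j & \text{if } |\phi^{j+1} - \phi_p| > \Delta\phi_{\max}\end{cases} \tag{50}$$

$$h^{j+1} = \begin{cases}h^{j+1} & \text{if } |h^{j+1} - h_p| \le \Delta h_{\max}\\ h^j & \text{if } |h^{j+1} - h_p| > \Delta h_{\max}\end{cases} \tag{51}$$

where $\phi_p$ and $h_p$ are from the previous frame. A default value for $\Delta h_{\max}$ is $0.1h_p$, and $\Delta\phi_{\max}$ is usually application dependent.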
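
Rules (50) and (51) translate directly into a small helper. The function name and argument order below are hypothetical; the default $\Delta h_{\max} = 0.1h_p$ follows the text, while `dphi_max` must be supplied because it is application dependent.

```python
def constrain_update(phi_new, h_new, phi_cur, h_cur, phi_prev, h_prev, dphi_max, dh_max=None):
    """Apply (50)-(51): accept a new orientation/scale only if it stays within the allowed
    change from the previous frame's values (phi_prev, h_prev); otherwise keep the current ones."""
    if dh_max is None:
        dh_max = 0.1 * h_prev  # default suggested in the text
    phi = phi_new if abs(phi_new - phi_prev) <= dphi_max else phi_cur
    h = h_new if abs(h_new - h_prev) <= dh_max else h_cur
    return phi, h

print(constrain_update(0.05, 1.05, 0.02, 1.02, 0.0, 1.0, dphi_max=0.1))  # (0.05, 1.05)
print(constrain_update(0.30, 1.20, 0.02, 1.02, 0.0, 1.0, dphi_max=0.1))  # (0.02, 1.02)
```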

4.2 MKC implementation

In an ideal scenario without occlusion or background interference, the performance of the MKC algorithm should be no worse than that of the mean shift method because of the spatial information it takes into account. However, this is not necessarily true in real applications. Some important issues should be taken into consideration before employing the MKC algorithm.

Kernel center selection in the MKC algorithm: Intuitively, kernel centers should be far away from each other to provide more discrimination in the target model; however, the noise may increase as a kernel center moves away from the centroid, since occlusion or background interference often occurs in the peripheral pixels. Therefore, the kernel centers should be restricted to some region around the centroid. In most situations, the maximum distance from a kernel center to the centroid should be no more than 1/3 of the minor axis.

Parallel MKC (PMKC) algorithm: Compared to the noise in the SKC model, the noise in the MKC model is larger if the occlusion or background interference is near the “perigee” of the kernel ellipse (with respect to the kernel center), and smaller if it is near the “apogee”. In view of this, we propose a procedure called the PMKC algorithm: use two sets of MKC on opposing sides within the kernel window and run the two MKC algorithms in parallel. The result that yields the larger Bhattacharyya coefficient is retained. Though the computational cost is somewhat higher, it yields very robust tracking performance.

Figure 4: Comparison of mean shift and MKC algorithm.
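
The PMKC selection step can be sketched as follows. `run_mkc` is a hypothetical callable standing in for one MKC run; it is assumed to return the new target state together with its Bhattacharyya coefficient. A thread pool is only one way to run the two center sets concurrently (the paper does not say how the parallel runs are scheduled); for a CPU-bound pure-Python tracker, a process pool or a plain loop would be equally valid.

```python
from concurrent.futures import ThreadPoolExecutor

def pmkc(run_mkc, center_sets, frame, state):
    """Run one MKC tracker per kernel-center set and keep the result with the largest
    Bhattacharyya coefficient. run_mkc(frame, state, centers) -> (new_state, rho) is assumed."""
    with ThreadPoolExecutor(max_workers=len(center_sets)) as pool:
        results = list(pool.map(lambda c: run_mkc(frame, state, c), center_sets))
    return max(results, key=lambda result: result[1])

# Example with a stand-in tracker (hypothetical): each center set returns a fixed score.
def fake_run(frame, state, centers):
    return centers, {"left": 0.91, "right": 0.87}[centers]

print(pmkc(fake_run, ["left", "right"], frame=None, state=None))  # ('left', 0.91)
```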

5 Experimental results

The RGB color space is quantized into 16 × 16 × 16 bins. An Epanechnikov profile

$$k(x) = \begin{cases}\frac{1}{2}c_d^{-1}(d + 2)(1 - x) & \text{if } x \le 1\\ 0 & \text{otherwise}\end{cases} \tag{52}$$

is employed, where $c_d$ is the volume of the unit $d$-dimensional sphere ($d = 2$ in our case). The MKC algorithm is employed using two kernel centers throughout all the experiments. One kernel center is on the centroid and the other one is on the axis. Since two different algorithms are described in Section 3 (MKC algorithm with location estimation only) and Section 4 (MKC algorithm with location, orientation and scale estimation), the experimental results are given in two parts.
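
For reference, the profile (52) and the corresponding $g(x) = -k'(x)$ can be written in Python as below; the function names are hypothetical. Because $g$ is constant inside the window, the factors $g_{i,l}$ in (38) and (45)-(47) act as an indicator of whether a pixel lies inside the kernel window. The unit-ball volume uses the standard formula $c_d = \pi^{d/2}/\Gamma(d/2 + 1)$.

```python
import math

def unit_ball_volume(d):
    """c_d: volume of the unit d-dimensional ball, pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def epanechnikov_profile(x, d=2):
    """k(x) from (52)."""
    return 0.5 / unit_ball_volume(d) * (d + 2) * (1.0 - x) if x <= 1.0 else 0.0

def epanechnikov_g(x, d=2):
    """g(x) = -k'(x): constant inside the window, zero outside."""
    return 0.5 / unit_ball_volume(d) * (d + 2) if x < 1.0 else 0.0

print(epanechnikov_profile(0.0), 2 / math.pi)  # k(0) = 2/pi for d = 2
```

With this normalization, the 2-D kernel $K(x) = k(\|x\|^2)$ integrates to 1 over the unit disk, which a midpoint-rule sum confirms numerically.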


5.1 Localization with fixed orientation and scale

The performance of the MKC algorithm given by (38) for fixed orientation and scale, shown in the bottom row of Fig. 4, is compared with that of the mean shift algorithm (top row of Fig. 4). For $\epsilon_y = 0.7$ (in image coordinates), the average number of iterations is about 3 for both algorithms. The mean shift algorithm yielded ambiguous localization due to the background interference, while the MKC algorithm, owing to its use of two kernel centers, tracked the target correctly.


5.2 Localization with orientation and scale estimation

Next, we give the experimental results of the MKC algorithm given by (45)-(47) with orientation and scale estimation. The target region (bottom row of Figs. 5-9) is marked by an ellipse with a (green) line across it representing the orientation. We compare the MKC algorithm with the method of [11] (top row of Figs. 5-9). The latter applies the mean shift algorithm to a 4D kernel, namely

$$K(x, y, \sigma, \theta) = K(x, y)\,K(\sigma)\,K(\theta) \tag{53}$$

where $K(x, y)$ is the spatial kernel, $K(\sigma)$ is the scale kernel and $K(\theta)$ is the orientation kernel. The thresholds (for both algorithms) are chosen as $\epsilon_y = 0.7$, $\epsilon_\phi = 0.01$ rad and $\epsilon_h = 0.01$. For the MKC algorithm, the average number of iterations is about 3 and the average number of Armijo rule iterations is about 2, while for the algorithm of [11], the average number of iterations is about 8.

Figure 6: Human Walking Sequence-2. Frames: 1, 50, 60, 74, 150, 170.

Figure 8: Box Sequence. Frames: 1, 20, 30, 40, 50, 60.

In Fig. 5 the person in the sequence walked quickly towards the camera, which resulted in fast scale changes. Both methods handled the scale changes very well. In Fig. 6 the target underwent large changes in both scale and orientation. The MKC algorithm tracked the target well, while the algorithm of [11] failed to estimate the orientation changes and “took” non-object regions into the kernel window. In Fig. 7 the scenario is even more challenging, with strong background interference. The MKC algorithm kept track of the target throughout the sequence. The tracking performance of the algorithm of [11] degraded drastically after the background interference arose, and its scale estimate diverged at the end of the sequence.

In the Box sequence (Fig. 8) and the Pink Cup sequence (Fig. 9), the tracker was tested for fast orientation changes. In Fig. 8 the average rotational speed was about 6°/frame and the maximum rotational speed was about 14°/frame. The MKC algorithm successfully tracked these fast orientation changes, whereas the algorithm of [11] lost the target. In Fig. 9 we added background interference (pink, similar to the cup) and employed the PMKC algorithm (parallel MKC, introduced in Section 4.2), with the results shown in the middle row. The average rotational speed was about 11°/frame and the maximum rotational speed was about 18°/frame. The PMKC tracker outperformed the MKC tracker in this particularly difficult scenario with fast rotation and background interference. The performance of [11] was the worst.

6 Summary and conclusions

This paper presented a new framework for target model representation based on multiple kernel centers (MKC). Compared to the traditional model computed with a single kernel center at the centroid (SKC), the MKC model is more flexible in target representation and more robust due to the spatial information it carries. The orientation and scale estimates are obtained from the MKC model by employing steepest gradient ascent. The proposed MKC algorithm and the mean shift approach share an adaptive stepsize rule, which results in a fast convergence rate. The parallel MKC algorithm was also introduced and shown to improve the tracking performance drastically.

Appendix

Proposition. If the kernel $K$ has a convex and monotonically decreasing profile, the sequence $\{y^j\}$ given by (38) converges to $y^*$, where $\rho(y^*)$ is a local maximum of the similarity function defined by the Bhattacharyya coefficient.

Proof: From (29) and (32), we have

$$\rho(y^{j+1}) - \rho(y^j) = \frac{C(h_0)}{2}\sum_{i=1}^{N}\sum_{l=1}^{L} w_{i,l}\left[k\left(\left\|\frac{r_l(y^{j+1}) - x_i}{h_0}\right\|^2\right) - k\left(\left\|\frac{r_l(y^j) - x_i}{h_0}\right\|^2\right)\right] \tag{54}$$

Since the kernel profile $k(x)$ is convex, the inequality

$$k(x_2) - k(x_1) \ge g(x_1)(x_1 - x_2) \tag{55}$$

holds, where $g(x) = -k'(x)$. Therefore, recalling (30), (54) becomes

$$\begin{aligned}\rho(y^{j+1}) - \rho(y^j) \ge{}& \frac{C(h_0)}{h_0^2}(y^{j+1} - y^j)^T\sum_{i=1}^{N}\sum_{l=1}^{L} w_{i,l}\,g_{i,l}\left(x_i - h_0\,\Delta r_l(\phi_0)\right)\\ &+ \frac{C(h_0)}{2h_0^2}\left(\|y^j\|^2 - \|y^{j+1}\|^2\right)\sum_{i=1}^{N}\sum_{l=1}^{L} w_{i,l}\,g_{i,l}\end{aligned} \tag{56}$$

Using the iteration (38), which gives $\sum_{i}\sum_{l} w_{i,l}\,g_{i,l}\left(x_i - h_0\Delta r_l(\phi_0)\right) = y^{j+1}\sum_{i}\sum_{l} w_{i,l}\,g_{i,l}$, we obtain

$$\rho(y^{j+1}) - \rho(y^j) \ge \frac{C(h_0)}{2h_0^2}\|y^{j+1} - y^j\|^2\sum_{i=1}^{N}\sum_{l=1}^{L} w_{i,l}\,g_{i,l} \tag{57}$$

Since the profile $k(x)$ is monotonically decreasing for all $x \ge 0$, we have $g_{i,l} \ge 0$; the weights $w_{i,l}$ are also nonnegative, so the right-hand side of (57) is positive as long as $y^{j+1} \ne y^j$ (given at least one nonzero $w_{i,l}\,g_{i,l}$, by assumption). Therefore, $\rho(y^j)$ is monotonically increasing for $y^{j+1} \ne y^j$. Since $\rho(y)$ is bounded by 1, the sequence $\{\rho(y^j)\}$ converges to its local maximum $\rho(y^*)$ at $y^{j+1} = y^j = y^*$. Q.E.D.

Figure 9: Pink Cup Sequence. Frames: 1, 8, 35, 80, 125, 170.